Fingerprint extraction



FIELD OF THE INVENTION

The invention relates to a method and arrangement for extracting a fingerprint
from a media signal.

BACKGROUND OF THE INVENTION

A fingerprint, also often referred to as signature or hash, is a sequence of bits
that is derived from multimedia content, e.g. an audio song, an image, a video clip, etc.
Multimedia fingerprints are used, inter alia, in the field of authentication where it is desired
to verify whether received content is original or detect whether the content has been
tampered with. Fingerprints are also used to identify media content. A service that is likely to
become very popular in the near future is audio identification. A fingerprint derived
from an unknown piece of music is sent to a database where the title, artist and other
metadata are looked up and returned to the consumer.

A known method of extracting a fingerprint from a media signal is disclosed in
Applicant's International Patent Application WO 02/065782. A schematic diagram of this
prior art method is shown in Fig. 1. The media signal (here an audio song) is divided into
overlapping frames (101). A spectral representation of each frame is obtained by performing
a Fast Fourier Transform (102). The energy of the audio signal in 33 logarithmically spaced
sub-bands is subsequently computed (103). The bands lie in the range 300-2000Hz, which is
perceptually the most relevant range. The 33 energy levels constitute a sequence of
perceptual property samples of the respective audio signal frame. In order to be invariant
with respect to the absolute loudness of the audio signal, and to prevent a major single audio
frequency from producing identical sequences for successive frames, a simple 2-dimensional
filter (104) is first applied to the spectrogram to obtain 32 differential property samples. The
sequence is subsequently converted into a bit string by an appropriate thresholding operation
(105). More particularly, a sub-band in a particular frame is assigned a bit '1' if the energy
difference with its neighboring sub-band is larger than the energy difference with its
neighboring sub-band in the previous frame. Otherwise, the fingerprint bit is '0'.
The known method produces a string of 32 bits for each audio frame
(≈0.4 sec). The frames are preferably overlapping (e.g. by a factor 31/32) so that the bit
strings change slowly over time. This makes the fingerprint extraction invariant with respect
to time shifting and frame boundary positioning. Typically, blocks of 256 overlapping
frames, i.e. 256x32=8192 bits (≈3 sec of audio), are used to identify a song.
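
The thresholding rule of the known method can be illustrated with a short sketch. It is only a minimal illustration in Python, assuming the 33 sub-band energies of two successive frames are already available as lists; the function name `prior_art_bits` and the sign convention (band m minus band m+1, bit 0 otherwise) are assumptions made for this example and are not taken from the cited application.

```python
def prior_art_bits(prev_energies, cur_energies):
    """Return the 32 fingerprint bits of one frame (hypothetical helper).

    prev_energies, cur_energies: 33 sub-band energies of the previous
    and the current frame.
    """
    if len(prev_energies) != len(cur_energies):
        raise ValueError("frames must have the same number of sub-bands")
    bits = []
    for m in range(len(cur_energies) - 1):
        # Energy difference between neighboring sub-bands, current frame.
        cur_diff = cur_energies[m] - cur_energies[m + 1]
        # The same difference in the previous frame.
        prev_diff = prev_energies[m] - prev_energies[m + 1]
        # Bit is 1 if the current difference is the larger one, else 0.
        bits.append(1 if cur_diff > prev_diff else 0)
    return bits


# Toy input: 33 made-up energies per frame (not real audio data).
prev = [float(m % 5) for m in range(33)]
cur = [float((3 * m) % 7) for m in range(33)]
frame_bits = prior_art_bits(prev, cur)
block_bits = 256 * len(frame_bits)  # bits in a block of 256 frames
```

Because only differences of energies are compared, multiplying both frames by the same loudness factor leaves the bits unchanged in this sketch.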

The prior art fingerprint extraction method has turned out to be very robust
against commonly used audio processing steps such as MP3 encoding, sample rate
conversion, D/A A/D conversion and equalization. However, it is not very robust against speed
changes. It is quite common for radio stations to speed up audio by a few percent. They
supposedly do this for two reasons. Firstly, the duration of songs is then shorter and therefore
it enables them to broadcast more commercials. Secondly, the beat of the song is faster and
listeners seem to prefer this. The speed change typically lies between zero and four percent.

OBJECT AND SUMMARY OF THE INVENTION

It is an object of the invention to provide an improved method and
arrangement for extracting a fingerprint from a media signal.

To this end, the method in accordance with the invention comprises the steps
of deriving from said media signal a sequence of samples of a given perceptual property of
the signal; subjecting the sequence of property samples to an auto-correlation function to
obtain a sequence of auto-correlation values; comparing said auto-correlation values with
respective thresholds; and representing the results of said comparisons by respective bits of
the fingerprint.

The method in accordance with the invention differs from the prior art method
in that the fingerprint bits are not derived from the perceptual property of the signal as such,
but from the auto-correlation of said property. The invention is based on the recognition that
a speed change of an audio signal causes energy levels in sub-bands to be shifted from one
sub-band to another, and exploits the insight that the auto-correlation function is shift
invariant.
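
The shift invariance can be checked numerically on a toy sequence. The sketch below uses a circular auto-correlation and a circular shift, which is an assumption chosen so that the invariance holds exactly; the function names and the integer "energy levels" are made up for illustration.

```python
def circular_autocorrelation(seq):
    """r[x] = sum_j seq[j] * seq[(j + x) mod n] for x = 0 .. n-1."""
    n = len(seq)
    return [sum(seq[j] * seq[(j + x) % n] for j in range(n))
            for x in range(n)]


def circular_shift(seq, s):
    """Rotate seq to the right by s positions."""
    s %= len(seq)
    return seq[-s:] + seq[:-s] if s else list(seq)


# Made-up integer "energy levels" so the comparison is exact.
levels = [3, 1, 4, 1, 5, 9, 2, 6]
shifted = circular_shift(levels, 3)

same = circular_autocorrelation(levels) == circular_autocorrelation(shifted)
```

Here `same` evaluates to `True`: the shifted sequence differs from the original, yet both yield identical auto-correlation values.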

The auto-correlation function is well-known in the continuous (time) domain.
However, we are dealing here with a finite sequence of property values (e.g. energy levels).
Therefore, in a practical embodiment of the method according to the invention, the desired
auto-correlation is approximated by correlating a sub-sequence of property samples with the
complete sequence of property samples.

The auto-correlation function is preferably computed from a statistically
significant number of property samples, which is larger than the desired number of
fingerprint bits. Down-sampling of the computed auto-correlation function is provided to
obtain the desired number of auto-correlation values.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows schematically a prior art arrangement for extracting a fingerprint
from an audio signal.

Fig. 2 shows schematically an arrangement for extracting a fingerprint from an
audio signal in accordance with the invention.

DESCRIPTION OF EMBODIMENTS 

Speed changes of an audio signal cause misalignment in both the temporal and
frequency domain. Considering time misalignment, an audio excerpt subjected to a speed
change of, say, 2% causes the 250th fingerprint of this excerpt to be extracted at the position
of the 255th fingerprint of the original excerpt. Fortunately, fingerprints are constructed such
that they possess correlation along the time axis, as the heavily overlapping frames make the
bit strings change slowly over time. Therefore,
the BER (bit error rate) between the original excerpt and the same excerpt with a speed
change does not increase dramatically due to the temporal misalignment.

The main problem caused by large speed changes is therefore the frequency
misalignment. In the prior art arrangement, which is shown in Fig. 1, a 2% speed-up will result in
a scaling of the frequency axis of the spectrum that is obtained with the Fourier Transform.
For example, a tone of 500Hz then results in a tone of 510Hz and a tone of 1000Hz results in a
tone of 1020Hz. After calculating the spectrum, the energy in logarithmically spaced bands is
determined. Since the bands are logarithmically spaced, the speed change results in a shift of
energy from one band to the next band. The more energy that shifts from one band into the
next, the greater the probability that the extracted fingerprint bits are erroneous. This is due to
the fact that the fingerprint bits are determined by energy differences of neighboring bands.
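
The size of this shift can be estimated with a short calculation. The sketch below assumes bands of equal width on a logarithmic frequency axis between 300 and 2000 Hz; the function name `band_shift` and its parameters are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def band_shift(speed_factor, num_bands, f_low=300.0, f_high=2000.0):
    """Shift, in units of one band, caused by scaling all frequencies.

    Assumes num_bands bands of equal width on a log-frequency axis
    spanning f_low to f_high (an assumption of this sketch).
    """
    band_width = math.log(f_high / f_low) / num_bands  # width in log units
    return math.log(speed_factor) / band_width


# A 2% speed-up moves a 500 Hz tone to 510 Hz and 1000 Hz to 1020 Hz.
shifted_tones = [round(f * 1.02, 6) for f in (500.0, 1000.0)]

# The shift is the same number of bands at every frequency,
# which is why the energy vector as a whole is shifted.
shift_33 = band_shift(1.02, 33)
shift_at_4_percent = band_shift(1.04, 33)
```

Under these assumptions a 2% speed-up already moves the energy by roughly a third of a band when 33 bands are used, and a 4% speed-up by roughly two thirds of a band.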

It has been proposed to use a brute force approach for identifying audio with
large speed changes. The brute force approach consists of storing fingerprints extracted at
multiple speeds in the database, or querying the database with fingerprints that are extracted
at multiple speeds. The disadvantage of this method is that the search time and/or storage
requirements increase by a factor N, where N is the number of different speeds that is
necessary for a certain application.

Fig. 2 shows an arrangement for extracting a fingerprint from an audio signal
in accordance with the invention. In the Fig., the same reference numerals are used for
functions that are identical with or similar to the steps that have already been discussed with
reference to Fig. 1. More particularly, the audio signal is divided into overlapping frames
(101) and the spectrum of each frame is computed (102).

An auto-correlation step (202) is the fundamental step to achieve the better
speed-change resilience. A speed change results in a shift of the computed energy vector.
Auto-correlation has the property that it is shift invariant. As is generally known, the
auto-correlation ρ(x) of a continuous function f(t) is:

$$\rho(x) = \int_{-\infty}^{\infty} f(t)\, f(t+x)\, dt$$

However, we are not dealing here with an infinite continuous function f(t) but
a finite sequence of property samples (energies). In order to compute the auto-correlation
from a statistically significant number of property samples, the energy of 512 sub-bands is
computed (201) instead of 33. The bands are still logarithmic and still lie in the range of 300
to 2000Hz. Thus the width of the bands is smaller. The auto-correlation is approximated by
correlating a sub-sequence of energies with the complete sequence. More specifically, the
auto-correlation ρ[x] is calculated from the sub-band energy samples E(j) as follows:

$$\rho[x] = \sum_{i=0}^{M-1} E(K+i)\, E(x+i) \qquad \text{for } x = 1, 2, \ldots, N-M$$

where N denotes the length of the whole energy vector (here N=512), M the length of the
sub-sequence and K the position where the sub-sequence starts in the complete sequence.
Typical settings for M and K are respectively 64 and 96. To increase robustness, the resulting
auto-correlation values are optionally low-pass filtered (203). The low-pass filtered
auto-correlation contains 512-64 = 448 values, whereas 33 input values are required for the
2-dimensional filter (104) preceding the threshold operation (105). Therefore the 448
auto-correlation values are down-sampled to 33 values in a down-sampler (204). The resulting
fingerprint is a 32-bit string for each frame.
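
The chain of steps (202), (203) and (204) can be sketched as follows, with a sub-sequence of length M = 64 starting at position K = 96 correlated with the complete vector of N = 512 energies. This is only an illustrative sketch: the text does not specify the low-pass filter or the down-sampling scheme, so the moving average and the evenly spaced picking used here are assumptions, as are the function names and the 0-based indexing.

```python
def sub_sequence_correlation(energies, m=64, k=96):
    """Correlate the sub-sequence energies[k:k+m] with the whole vector.

    Returns len(energies) - m values, one per position x of the window.
    Indices are 0-based here (an assumption of this sketch).
    """
    n = len(energies)
    sub = energies[k:k + m]
    return [sum(s * energies[x + i] for i, s in enumerate(sub))
            for x in range(n - m)]


def low_pass(values, taps=5):
    """Optional smoothing with a simple moving average (assumed filter)."""
    half = taps // 2
    out = []
    for x in range(len(values)):
        window = values[max(0, x - half):x + half + 1]
        out.append(sum(window) / len(window))
    return out


def down_sample(values, count=33):
    """Pick `count` evenly spaced values (assumed down-sampling scheme)."""
    step = len(values) / count
    return [values[int(j * step)] for j in range(count)]


def frame_features(energies):
    """512 sub-band energies -> 33 values for the filter/threshold stage."""
    return down_sample(low_pass(sub_sequence_correlation(energies)))


# Toy energy vector of N = 512 made-up values (not real audio data).
toy_energies = [1.0 + ((7 * j) % 13) / 13.0 for j in range(512)]
rho = sub_sequence_correlation(toy_energies)
features = frame_features(toy_energies)
```

The 33 values returned by `frame_features` take the place of the 33 sub-band energies of the prior art arrangement, so the 2-dimensional filter (104) and the threshold operation (105) can remain unchanged.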

Although embodiments of the method and arrangement have been described
with reference to audio fingerprint extraction, the invention is not restricted thereto.
Applicant's International Patent Application WO 02/065782, already cited above, discloses a
video fingerprint extracting method in which the fingerprint is derived from the mean
luminance values of image blocks into which each image is divided. In accordance with the
invention, each image is now divided into a larger number of blocks, and a sub-set of the
blocks (a "super-block") is correlated with the whole image for a number of positions of said
super-block. The obtained sequence of auto-correlation values is invariant to shifts of the
video image. The sequence is optionally low-pass filtered and subsequently down-sampled.
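
For the video case, the correlation of a super-block with the whole image might look as follows. This is a hedged sketch only: it assumes the mean block luminances are given as a 2-dimensional list, evaluates every possible position of the super-block (the text only requires "a number of positions"), and uses a hypothetical function name and toy data.

```python
def super_block_correlation(lum, top, left, height, width):
    """Correlate one super-block with the whole grid of block luminances.

    lum: 2-D list of mean block luminances (rows x columns).
    (top, left, height, width): location and size of the super-block.
    Returns one correlation value per position of the super-block.
    """
    rows, cols = len(lum), len(lum[0])
    block = [row[left:left + width] for row in lum[top:top + height]]
    result = []
    for y in range(rows - height + 1):
        for x in range(cols - width + 1):
            total = 0.0
            for dy in range(height):
                for dx in range(width):
                    total += block[dy][dx] * lum[y + dy][x + dx]
            result.append(total)
    return result


# Toy 6 x 8 grid of made-up luminance means (not real video data).
grid = [[float((3 * r + 5 * c) % 11) for c in range(8)] for r in range(6)]
values = super_block_correlation(grid, top=2, left=3, height=2, width=3)
```

As in the audio case, the resulting sequence would then optionally be low-pass filtered and down-sampled before thresholding.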

The invention can be summarized as follows. Fingerprints are bit strings
extracted from a media signal (e.g. an audio or video clip) to identify said media signal.
Typically, they are derived from a perceptual property of the signal, for example, the spectral
energy distribution of an audio fragment or the luminance distribution of a video image. A
method and arrangement for extracting a fingerprint is here disclosed which is robust with
respect to shifts of the perceptual property. Such shifts occur inter alia when the fingerprint is
derived from a logarithmically mapped spectral energy distribution of an audio signal and
said audio signal is subjected to speed changes. In accordance with the invention, the
fingerprint is not derived from the perceptual property as such, but from its auto-correlation
function.

CLAIMS:






1. A method of extracting a fingerprint from a media signal, comprising the steps
of extracting from said media signal a sequence of samples of a given perceptual property of
the signal, and deriving from said sequence a binary sequence constituting said fingerprint,
characterized in that the method comprises the steps of:

- subjecting the sequence of property samples to an auto-correlation function to obtain a
sequence of auto-correlation values;

- comparing said auto-correlation values with respective thresholds; and

- representing the results of said comparisons by respective bits of the fingerprint.

2. A method as claimed in claim 1, wherein said step of subjecting the sequence
of property samples to an auto-correlation function comprises correlating a sub-sequence of
property samples with the complete sequence of property samples.

3. A method as claimed in claim 1, wherein said step of subjecting the sequence
of property samples to an auto-correlation function further includes down-sampling the
sequence of auto-correlation values to obtain a desired number of auto-correlation values.

4. A method as claimed in claim 1, wherein said step of deriving from said media
signal a sequence of perceptual property values comprises dividing an audio signal into
sub-bands and computing the energies of said audio sub-bands.

5. A method as claimed in claim 1, wherein said step of deriving from said media
signal a sequence of perceptual properties comprises dividing an image into blocks and
computing the luminances of said image blocks.



6. An apparatus for extracting a fingerprint from a media signal, comprising
means for deriving from said media signal a sequence of samples of a given perceptual
property of the signal, and means for deriving from said sequence a binary sequence
constituting said fingerprint, characterized in that the apparatus comprises:

- means for subjecting the sequence of property samples to an auto-correlation function to
obtain a sequence of auto-correlation values;

- means for comparing said auto-correlation values with respective thresholds; and
representing the results of said comparisons by respective bits of the fingerprint.

7. A computer program comprising instructions to cause a programmable device
to perform the steps of:

- deriving from a received media signal a sequence of samples of a given perceptual
property of the signal;

- subjecting the sequence of property samples to an auto-correlation function to obtain a
sequence of auto-correlation values;

- comparing said auto-correlation values with respective thresholds; and

- representing the results of said comparisons by respective bits of a fingerprint.

ABSTRACT:

Fingerprints are bit strings extracted from a media signal (e.g. an audio or
video clip) to identify said media signal. Typically, they are derived from a perceptual
property of the signal, for example, the spectral energy distribution of an audio fragment or
the luminance distribution of a video image. A method and arrangement for extracting a
fingerprint is here disclosed which is robust with respect to shifts of the perceptual property.
Such shifts occur inter alia when the fingerprint is derived from a logarithmically mapped
spectral energy distribution of an audio signal and said audio signal is subjected to speed
changes. In accordance with the invention, the fingerprint is not derived from the perceptual
property as such, but from its auto-correlation function.
